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Abstract 

HIV-1 protease represents an appealing system for directed enzyme re-design, since it has various different endogenous 
targets, a relatively simple structure and it is well studied. Recently Chaudhury and Gray (Structure (2009) 17: 1636-1648) 
published a computational algorithm to discern the specificity determining residues of HIV-1 protease. In this paper we 
present two computational tools aimed at re-designing HIV-1 protease, derived from the algorithm of Chaudhuri and Gray. 
First, we present an energy-only based methodology to discriminate cleavable and non cleavable peptides for HIV-1 
proteases, both wild type and mutant. Secondly, we show an algorithm we developed to predict mutant HIV-1 proteases 
capable of cleaving a new target substrate peptide, different from the natural targets of HIV-1 protease. The obtained in 
silico mutant enzymes were analyzed in terms of cleavability and specificity towards the target peptide using the energy- 
only methodology. We found two mutant proteases as best candidates for specificity and cleavability towards the target 
sequence. 
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Introduction 

Proteases represent a class of enzymes ubiquitous in all living 
organisms, with multiple applications in industry and biotechnol- 
ogy research [1-3]. There is thus interest in designing new 
proteases capable of cleaving specific peptide sequences [4] . HIV- 
1 protease (PR) represents an attractive starting structure for 
directed enzyme re-design, since it is known to cleave a variety of 
sequences. PR is the enzyme responsible for processing the gag - 
pol fusion polyproteins of the HIV virus [5]. PR is an aspartic 
protease [6-8] and is a homodimer where each chain is composed 
of 99 residues. Wild type PR (WT-PR) is very specific for the 
endogenous cleavage sequences of the polyprotein (endogenous 
substrate peptides, Table SI in File SI), even if the source of this 
specificity is still not completely clear. A series of other non- 
endogenous peptides have also been found to be cleaved by PR. 
The latest hypothesis on the origin of this specificity, called 
dynamic substrate envelope [9,10], states that peptides fitting into 
the protease cavity through a certain number of hydrogen bonds 
will be bound and possibly cleaved nearly regardless of their amino 
acid composition. In fact, there is no clear trend in amino acid 
sequence (e.g. a negatively charged amino acid in position PI or a 
hydrophobic one in position P2'). This suggests that with few 
mutations PR could be made to cleave other target peptide 
sequences in a specific manner. 

Many computational studies on PR, both wild type (WT) and 
drug resistant mutant enzymes, are aimed at elucidating the 
affinity of the enzymes towards endogenous substrates and 
inhibitors to be used as drug candidates [11-14]. Recently 
Chaudhury and Gray [15] published a computational algorithm 
specifically tailored for PR and aimed at the identification of the 



specificity determining residues. The algorithm is based on 
PyRosetta [16], a python script-based interface to Rosetta [17]. 
Thanks to the algorithm the authors were able to predict accurate 
protease - substrate complex structures (within 1.1 A rms of the 
corresponding crystal structure) and introduced an energetic 
discrimination of cleavable peptides. More recendy Alvizo et al. 
[18] employed computational methods to re-engineer a mutant 
PR (Pr3) more specific for one of the endogenous peptide 
sequences over two others. 

The first aim of this study is to develop an energy-only based 
methodology to discern cleavable and non cleavable peptides for 
PRs, WT and mutant. This methodology is based on the 
qualitative evaluation of PR: peptide complexes binding energies 
and is derived from the algorithm developed by Chaudhury and 
Gray. The second aim is to search and define an algorithm to 
predict mutant PRs capable of cleaving a specific target peptide 
sequence different from any endogenous substrate. We use our 
cleavability discerning methodology on the suggested mutant 
proteases, in order to define the best guess in terms of specificity 
towards the peptide sequence. In other words, the sought after 
mutant structure has to show better and worse binding towards the 
target and endogenous peptides, respectively, than WT-PR. To 
the best of our knowledge ours is the first study aimed at predicting 
a mutant PR capable of cleaving specifically a non endogenous 
peptide sequence. 

The paper is organized as follows: first, we present our 
computed binding energies for known cleavable and non-cleavable 
peptides bound to WT-PR, selected peptides bound to a set of 
single, double and triple mutants (Pr3 set) derived from Pr3 as 
developed by Alvizo et al., and a set of known mutant PRs and 
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Table 1. Computed binding energies of WT-PR and cleavable endogenous peptides. 



Substrate Peptide 


FMO (kcal/mol) 


RosettaDock (kT) 


exp AT,,, ImMI [22] 


MA-CA 


— 57 


-9 


0.15 


CA-p2 


-52 


-6 


0.01 


p2-NC 


-72 


-4 


0.05 




— 68 


— 3 




p1-p6 


-47 


1 




p6pol-PR 


-41 


-6 




TF-PR 


-62 


-5 


<0.01 


PR-RTp51 


-64 


-7 


0.07 


RTp51-RTp66 


-68 


-12 


0.04 


RTp66-INT 


-62 


-6 




RH-IN 


-63 


-10 


0.006 



doi:1 0.1 371 /journal.pone.0095833.t001 



peptide derived from drug resistance (DR set) studies [19-21]. 
Secondly, we present two different versions of our algorithm to 
determine mutant PRs that will cleave the sequence 
HFLSF*MAIP, where the * symbol indicates the desired cleaving 
site. A discussion about the best strategy to suggest mutant 
enzymes follows. The conclusions summarize the main findings of 
the paper, followed by a detailed description of the employed 
computational methods. 

List of Abbreviations 

PR HIV-1 protease. 

WT-PR Wild type HIV-1 protease. 

mutant PR Mutant HIV-1 protease. 

Pr3 set Set of mutant proteases derived from Pr3, a mutant 
protease developed by Alvizo el al. These were heterodimer 
proteases. 

DR set Set of mutant proteases derived as a subset of HIV- 1 
proteases that have been found to be drug resistant. These were 
homodimer proteases. 

Results and Discussion 

Development of a Cleavability Test 

In general, the activity of an enzyme towards two similar 
substrates is regulated by (i) the strength of the enzyme-substrate 
binding and (ii) the efficiency of the enzymatic reaction. The two 
processes are regulated by two constants, usually indicated as k m 
and k cat , respectively. The overall enzymatic efficiency is given by 
the ratio of these two constants. The dynamic substrate envelope 
hypothesis [10] suggests that if a peptide is bound to PR it will be 
cleaved. Thus, we decided to evaluate the binding energy of 
different peptides to PR, which can be correlated to k m . We then 
compared the computed binding energies to PR of known 
cleavable and non cleavable peptides, to be correlated to 
corresponding ranges of binding energies. By so doing we 
disregarded k cal , that is we did not consider possible effects from 
the enzymatic reaction. 

The cleavability test was developed by considering binding 
energies of WT-PR with its endogenous and known cleavable 
substrates and known non-cleavable peptides. Afterwards we 
investigated the reliability of the test with mutant PRs (the Pr3 set) 
when binding PR endogenous substrates. Finally, we assessed the 
test on mutant PRs (the DR set) when binding mutant substrates. 



The complete methodology for evaluating binding energies is 
described in the computational methods section. In brief, it is 
composed by a structure optimization algorithm, followed by an 
energetic re-evaluation of the obtained structures. In the following 
paragraphs we evaluate our methodology in terms of binding 
energies versus cleavability for: (1) WT-PR and its endogenous 
substrates and known cleavable and non-cleavable peptides, (2) the 
Pr3 set of mutant PRs and endogenous substrates and (3) the DR 
set with wild type and mutated endogenous peptides. Binding 
energies were computed also for WT-PR and all mutant PRs in 
complex with octa-alanine (poly-Ala) and octa-arginine (poly-Arg) 
peptides to test for aspecific binding. 

(1) Table 1 reports the computed binding energies of the set of 
known cleavable endogenous peptides of WT-PR. The sequence 
of the tested endogenous cleavable peptides is reported in Table 
SI in File SI. Alongside the endogenous peptides, a set of 59 
known cleavable peptides was also tested. The sequence of the 59 
tested non-endogenous cleavable peptides was obtained as 
previously described [15,22-26]. Table S5 in File SI reports the 
computed binding energies to WT-PR and Table S2 in File S 1 the 
sequences of these non-endogenous peptides. Table S6 in File SI 
reports the computed binding energies of a set of peptides 
supposedly non-cleavable by WT-PR. The sequence of the 43 
tested non-cleavable peptides was obtained as previously described 
[15,26,27] and is reported in Table S3 in File SI. 

We performed a Mann-Whitney's U test [28] to compare the 
computed binding energies, and found a significant difference 
between the cleavable and non-cleavable sets (p «10~ 7 ), as 
reported in Table 2. Thus, we deemed the binding energy 
criterion sufficient to achieve discrimination. We further analyzed 
the computed binding energies through an ROC plot [29] relative 
to different cutoff values, so as to differentiate between cleavable 
and non-clevable peptides. The plot is reported in Figure 1, and 
the relative data in Table S 1 1 in File S 1 . The computed area 
under ROC [30] is 0.79 and 0.80 for FMO and RosettaDock 
energies, respectively, being the values of 0.50 and 1.00 typical 
correspondingly of a useless and a perfect test. Through the ROC 
plot, we found the best cutoff values discerning cleavable and non- 
cleavable peptides as those closest to (0, 1), which represents the 
theoretical perfect test. We found that cutoff values of —25 kcal/ 
mol and —3 kT are best at discerning FMO and RosettaDock 
computed binding energies, respectively. Both FMO and Rosetta- 
Dock perform well in computing binding energies capable of 
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discerning cleavable and non-cleavable peptides. However, Figure 
SI in File SI shows that there is no apparent correlation between 
FMO and RosettaDock computed binding energies. Thus, we 
repeated the Mann- Whitney's U test and ROC analysis excluding 
the set of non-endogenous known cleavable peptides binding 
energies. The rationale behind this analysis is that we expect WT- 
PR to bind the endogenous peptides with higher affinity, as 
opposed to the broader range of the complete cleavable set, 
characterized only by cleavability and not specificity. Consequent- 
ly, we assume that the endogenous peptides set have better binding 
energies, than the complete set of cleavable peptides. The Mann- 
Whitney's U test (Table 2) shows that the RosettaDock based 
binding energies are in this case two orders of magnitude worse 
than FMO at discerning cleavable and non-cleavable peptides. 
The relative ROC plot (Figure S2 in File S 1) shows as well that the 
FMO data performs better than RosettaDock, in terms of more 
strict best cutoff value and larger area under the ROC. Thus, we 
concluded that FMO computed binding energies are better than 
RosettaDock ones since are capable of discerning expected effects, 
such as the usage of a better performing subset of peptides. In the 
rest of this paper we will discuss only binding energies computed 
through FMO energy re-evaluation. 

From Table 1 it is expected that WT-PR exhibits qualitatively 
different binding to the poly-protein substrates, given their 
computed binding energies ranging from —41 for the binding 
of p6pol-PR to —72 kcal/mol for p2-NC, with an average value of 
— 60 kcal/mol. However, available experimental K m values [22] 
do not show any trend similar to the computed data. Still, one has 
to remember that these computed binding energies should be 
considered only qualitatively and only compared to others 
obtained in the same manner. See the Computational Methods 
section for further details. Furthermore, the span of both 
computed energies for which experimental data are available 
(20 kcal/mol) and the K m values (2 orders of magnitude) is too 
small to allow a clear trend. The computed binding energies for 
the set of cleavable non-endogenous peptides (Table S5 in File SI) 
span a wide range of values, from —2 to —86 kcal/mol, with 
average —40 kcal/mol. These peptides not being the natural 
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Figure 1. ROC plot comparing different cutoff values for 
binding energies computed through FMO energy re-evalua- 
tion or RosettaDock energy function. The values for each method 
closest to the theoretical optimum (0,1) are highlighted. The computed 
area under the ROC curve is 0.79 and 0.80 for FMO and Rosetta, 
respectively. The raw data is reported in Table S11 in File S1. 
doi:10.1371/journal.pone.0095833.g001 



target of WT-PR may account for this large span. The average 
computed binding energy for all cleavable peptides is —43 kcal/ 
mol. The computed binding energies for the non-cleavable set of 
peptides (Table S6 in File SI) span an even wider range of values 
than those of the cleavable ones. Some PR - peptides complexes 
show positive energies. The majority (56%) of the computed 
binding energies are in the range —35-0 kcal/mol. However, a 
few peptides show a binding energy to WT-PR similar to those of 
the cleavable peptides. 

(2) Recently Alvizo et al. [18] suggested through computational 
means a triple mutant (Pr3) with increased binding capability 
towards the endogenous RTp51-RTp66 cleavage sequence 
peptide compared to that towards other two cleavage sequences 
CA-p2 and p2-NC. The efficiency of Pr3 in cleaving preferentially 
RTp5 l-RTp66 was later experimentally verified. Pr3 was made by 
tethering a mutated chain of protease (A28S, D30F, G48R) to a 
wild type one. For comparison with our predicted mutant PRs, 
Table 3 reports our computed binding energies for the Pr3 three- 
fold mutant, as well for simpler one- and two-fold mutant PRs 
derived from Pr3 (Pr3 set), as compared to WT. Note, however, 
that experimental data are available only for the three-fold mutant 
PR. In our calculations, Pr3 set carried mutations only on chain A, 
while still being formed by two separate chains. We expected to 
find that Pr3 computed binding would be stronger towards 
RTp51-RTp66, while weaker towards CA-p2 and p2-NC, 
compared to WT-PR. The computed binding energies of the 
Pr3 set show that the mutant enzymes often have higher affinity 
for the desired RTp51-RTp66 peptide compared to CA-p2 and 
p2-NC. Most notably the double mutant A28S/G48R has a 
stronger computed binding energy towards the target peptide than 
WT-PR, while lower for the other two endogenous substrates. The 
binding energy test indicates that A28S/ G48R (for which there is 
no experimental data available) would have been a more successful 
mutation than Pr3. Nevertheless, the possibility of using the 
binding energy test with mutant PRs was found viable. 

(3) Finally we decided to apply the binding energy test to series 
of mutant PRs binding mutant endogenous substrates. Thus, we 
evaluated the binding energies of drug resistant HIV- 1 proteases 
towards wild type and mutant substrate peptides. It has been 
found that mutations of the cleavage sites are correlated to 
mutations of the protease, often leading to drug resistance. We 
analyzed the K436R and A431V mutations of the NC-pl Gag 
substrate peptide cleavage sequence in relation to a series of single 
mutations and one double mutation of HIV-1 protease (DR set). It 
has been reported [19] that a K436R mutation increases resistance 
to protease inhibitor drugs when combined with 150 V, 184 V and 
184 V/L90M PR mutations, while the A431V mutation results in 
a more efficient PR regardless of other mutations. We expected 
that the more efficient mutant PR - mutant peptide combinations 
were also characterized by stronger binding energies. Table 4 
reports the results of our binding energy test for the DR set. Our 
methodology indicates cleavability for all combinations of mutant 
PRs and mutated NC-pl substrate peptides. While there are some 
fluctuations in the binding energies, no clear pattern arises that 
can be related to the experimental findings. Possibly, the increased 
efficiency of drug resistant mutant proteases towards mutated 
peptides is related to k cat . As previously stated, the effects of this 
constant are not considered by the present approach. Nevertheless, 
the binding energy test was found suitable also for combinations of 
mutant PRs with any peptide. 

Prediction and Analysis of Mutant PRs 

The second aim of this study was to develop a computational 
methodology for the design of a mutant PR. The sought after 



PLOS ONE | www.plosone.org 



3 



May 2014 | Volume 9 | Issue 5 | e95833 



In Silico Prediction of Mutant HIV-1 Proteases 



Table 2. Comparison of WT-PR computed binding energies. 







RosettaDock Energy Function 


FMO Energy Re-evaluation 


Average endogenous 3 


-6 (kT) 


-60 (kcal/mol) 


(Standard deviation) 


(3) (kT) 


(10) (kcal/mol) 


Average all cleavable b 


-5 (kT) 


-43 (kcal/mol) 


(Standard deviation) 


(3) (kT) 


(22) (kcal/mol) 


Average non cleavable c 


-1 (kT) 


-15 (kcal/mol) 


(Standard deviation) 


(4) (kT) 


(28) (kcal/mol) 


U test probability (all cleavable VS non-cleavable) 


1.46-10- 7 


2.2110- 7 


U test probability (only endogenous VS non-cleavable) 


3.49-10- 4 


6.1610- 6 



"Table 1. 

b Table 1 plus Table S5 in File SI. 
c Table S6 in File SI. 

Energies were computed with the standard RosettaDock energy function, as described in [15] and with the FMO re-evaluation. A Mann-Whitney's U test probability was 
evaluated by comparing the binding energies of the set of endogenous peptide against the non-cleavable and the entire set of cleavable peptides against the non- 
cleavable. The FMO based binding energies are more clear in discriminating cleavable and non cleavable peptides than the Rosetta based ones. 
doi:1 0.1 371/joumal.pone.0095833.t002 



enzyme had to be capable of cleaving a new target substrate 
different from the endogenous ones. The obtained mutant PR 
should also be specific for the target peptide sequence compared to 
the endogenous peptides. The chosen sequence for the target 
peptide was HLSF*MAIP, where the * symbol indicates the 
desired cleaving site. The sequence was extracted from that of 
K-casein. Once candidate mutant PRs were obtained, we 
employed the binding energy test to asses the enzymes cleaving 
capabilities. The possibility of an increase in cleaving capability 
towards the target substrate was asserted by differences in binding 
energy between WT-PR and mutant PRs. We evaluated the 
binding energies of mutant PRs in complex with the TF-PR 
peptide, used as a starting template (see the Computational 
Methods section), and the CA-p2 and p2-NC peptides (for selected 
mutant PRs) in order to test the specificity of our mutant PRs. 

The mutant-generating algorithm is described in details in the 
Computational Methods section. Two main strategies (Strategy 1 
and Strategy2) were employed for generating mutant PRs. In 
Strategy 1, the side chains of only the 6 residues previously 



indicated as specificity determining [15] were allowed to change. 
The analysis of the binding energies of the mutant PRs generated 
by Strategy 1 found the enzymes insufficient to perform the desired 
scope. This prompted us to further develop the algorithm. In 
Strategy2, the side chains of 26 residues were allowed to change. 
See the Computational Methods section for further details on the 
residues choice. The analysis of the binding energies of these 
mutant PRs found some of the predicted enzymes to be adequate 
to cleave the desired target sequence. 

Tables S7 and S8 in File SI reports the Strategy 1 mutant PRs 
(M1-M16) and their computed binding energies towards the 
targetpeptide and TF-PR, CA-p2 and p2-NC endogenous 
peptides. Among these mutant PRs, M5 shows the strongest 
binding energy towards the target peptide. However it has to be 
noted that the computed binding energy of M5 towards the TF- 
PR peptide (used as a starting template for all mutant enzymes) is 
also stronger with respect to WT. Possibly M5 is simply a better 
generic binder. To verify this hypothesis we tested M5 as a binder 
also for other two endogenous peptide sequences, CA-p2 and p2- 



Table 3. FMO computed binding energies of HIV-1 protease WT and Pr3 set of mutant PRs. 





PR 


Peptides 














RTp51-RTp66 


poly-Ala 


poly-Arg 


TF-PR 


CA-p2 


p2-NC 


WT-PR 


-68 


-15 


-41 


-62 


-52 


-72 


Single mutant 


A28S 


-65 


-12 


-35 


-41 


-13 


-67 


D30F 


-48 


-1 


0 


-7 


-4 


-43 


G48R 


-66 


-44 


-27 


-55 


-15 


-43 


Double mutant 


A28SD30F 


-44 


-16 


30 


-35 


-22 


-55 


A28SG48R 


-96 


-16 


-54 


-35 


-14 


-60 


D30FG48R 


-76 


3 


18 


-19 


-1 


-54 


Triple mutant 


A28SD30FG48R 


-42 


-21 


12 


-32 


-12 


-63 



Computed binding energies (kcal/mol) of WT and mutant HIV-1 proteases in complex with RTp51-RTp66, poly-alanine, poly-arginine, TF-PR, CA-p2 and p2-NC peptides. 
doi:1 0.1 371 /journal.pone.0095833.t003 
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Table 4. FMO computed binding energies of HIV-1 protease WT and selected drug resistance mutant PRs (DR set). 



PR 



Peptides 





NC-pl WT 


NC-pl™ 6 * 


NC-p1* ,31F 


poly-Ala 


poly-Arg 


WT-PR 


-49 


-70 


-56 


-15 


-41 


D30N 


-29 


-44 


-36 


-15 


-23 


I50L 


-64 


-49 


-54 


-16 


-49 


150V 


-54 


-46 


-52 


-18 


-34 


V82A 


-45 


-54 


-52 


-16 


-38 


184V 


-46 


-75 


-35 


-23 


-38 


184V L90M 


-55 


-67 


-55 


-20 


-46 



Computed binding energies (kcal/mol) of WT-PR and selected drug resistance mutant proteases in complex with NC-p1 as wild type, K436R and A431 V drug resistance 
associated mutant peptides, poly-alanine and poly-arginine peptides. 
doi:1 0.1 371 /journal.pone.0095833.t004 



NC. Compared to WT-PR, M5 has weaker binding energy for the 
former peptide, but equal for the latter. In conclusion, M5 is not 
predicted to be more specific for the target sequence than for the 
endogenous peptides. Moreover, M5 was not direcdy predicted 
through Strategyl, but as a homodimeric derivative of M2, which 
shows only a small improvement in binding of the target peptide. 
All other mutant PRs suggested by Strategyl, M1-M4 and M6- 
M16, were found having a weak binding energy towards the target 
peptide, with some of them showing prominently positive binding 
energies. It can be concluded that Strategyl is unsatisfactory at 
predicting a mutant PR with an increased and specific affinity 
towards the target peptide. This is possibly due to the fact that 
allowing only six residues to change is too strict a condition to 
achieve a suitable mutant PR. 

Thus, we decided to further improve the mutation algorithm by 
including more residues among those that can be changed. The six 
generations of mutant PRs computed through our Strategy2 
mutant algorithm are presented in Table 5. We refer to them as 
generations since at each macro step of the algorithm the lowest in 
energy (as computed with the standard RosettaDock energy 
function) structure was used as starting point for the next step. The 
sixth generation (M23) did not produce any new change with 
respect to the fifth (M22), and the algorithm was consequently 
terminated. For each generation the structure with the lowest 
absolute energy was further optimized. After generation 1 two 
mutant structures were chosen (M17 and M18) since they are very 
close in energy (as evaluated with the RosettaDock energy 
function, data not shown) but relatively different as mutation 
sites. In addition, an extra mutant PR (M24) was generated as 
homodimer of M22. The computed binding energies of the 
Strategy2 mutant PRs (M17-M24) are shown in Table 6. All 
Strategy2 mutant PRs show a binding energy towards the target 
sequence two to four fold stronger than WT-PR, with Ml 7 
displaying the strongest binding energy. However, as for M5, 
binding energies towards the template peptide TF-PR as well as 
CA-p2 and p2-NC are also stronger than WT. Possibly M17 is 
also a good but generic binder. Through the subsequent 
generations of mutant proteases, at last M22 shows a binding 
energy towards the target peptide more than three fold stronger 
than WT, while the computed binding energy towards the natural 
endogenous substrates is weaker than WT. Similar results were 
obtained for its homodimer M24. M22 and M24 show binding 
energies below the cutoff value of —25 kcal/mol, and thus 
represent the best candidates to be further studied experimentally. 



We compared the structures of WT-PR and M24 as optimized 
while binding the target peptide. Figure 2 reports the superim- 
posed backbones of the two enzymes after structure alignment. 
The two computed structures are quite coincident. Hence, it is 
expected that M24 should retain the main structural features of 
the wild type enzyme. We also tried to analyze the choice of 
changed residues. Figure 3 shows that the residues that were 
changed from WT-PR to M24 are disposed all around the bound 
peptide. Figures S3-S14 in File SI compare each residue that 
differs between WT-PR and M24, while bound to the target 
peptide. Although it is evident that the A28S substitution on chain 
A introduces a hydrogen bond between the residue and the side 
chain of the serine in the peptide (Figure S3 in File SI), the other 
substitutions are less easily rationalized. On going study aims at 
elucidating the role of the other residues substitutions. 

It is interesting to note that Strategy2 mutated only 7 out of the 
26 residues that were set as mutable in the method. It is also worth 
noting that of the 7 residues (A28, D30, K45, 150, P81, V82, 184) 
suggested by Strategy2 in the various mutant generations, A28, 
K45, P8 1 are not included in the set of major mutations site of 
HIV-1 protease responsible for drug resistance [31], that is: D30, 
V32, M46, 147, G48, 150, 154, Q58, T74, L76 V82, N83, 184, 
N88, L90. A28, K45, and P81 together with 150 are also not 
included in the specificity determining residues set [15]. However, 
A28 was located by Alvizo et al. for the Pr3 mutant [18]. We 
envision Strategy2 also as a tool to locate those residues most 
involved in binding a given substrate peptide. 

From the analysis of the different PRs, mutant and wild type, 
and their binding energies, it is worth to note that WT-PR has a 
certain affinity with the octa-arginine peptide. Its computed 
binding energy is at the limit to consider the octa-arginine peptide 
as cleavable by WT-PR. Possibly this relatively strong binding is 
given by very few interactions. Accordingly, the single D30F 
change on chain A, that is changing one negatively charged 
residue into an aromatic hydrophobic one, is able to drop the 
computed binding energy to 0, as shown in Table 4. The currently 
going analysis of the residue by residue interactions for the 
modified side chains will give further information also on this 
aspect of the binding of PR. 

Finally, it is interesting to note that the algorithm is not always 
preserving amino acid side chain changes through the generations. 
For example, 184 V on chain A is introduced in M18 and kept in 
Ml 9, M20 and M21, but later reverted. Possibly, an isoleucin in 
position 84 is energetically more favorable, given the other side 
chain changes. 
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Table 5. Strategy2 suggested mutant PRs. 



Mutant ID 


Chain A 


Chain B 


Mutation Scheme 


Notes 


M17 


A28S D30T 


A28S D30T K45M I50L V82F 


F 


After one mutation step 


M18 


A28S D30T I50L P81D V82R 184V 


A28S D30T K45M I50L V82Y 




After one mutation step 


M19 


A28S D30T I50L P81D V82R 184V 


A28S D30T K45A I50L V82Y I84L 




After two mutation steps 


M20 


A28S D30T I50L P81D V82R 184V 


A28S D30T K45D I50L V82Y I84L 




After three mutation steps 


M21 


A28S D30T I50L P81L V82Y 184V 


A28S D30T K45D I50L V82Y 




After four mutation steps 


M22 


A28S D30T I50L P81L V82Y 


A28S D30T K45A I50L V82Y 




After five mutation steps 


M23 


A28S D30T I50L P81L V82Y 


A28S D30T K45A I50L V82Y 




After six mutation steps 


M24 


A28S D30T K45A I50L P81L V82Y 


A28S D30T K45A I50L P81 L V82Y 




Homodimer of M22 



M17-M23 represent the subsequent generations of mutant PRs suggested by Strategy2. All mutant enzymes were generated following Scheme F. 
doi:1 0.1 371 /journal.pone.0095833.t005 



Conclusions 

In the first part of this study we developed a methodology to test 
the cleavability of a peptide by HIV-1 protease (Tables 1 and S6 in 
File SI), solely based on the binding energy between the enzyme 
and the substrate. The methodology can also be applied to mutant 
PRs, Table 3. The technique is based on a PyRosetta algorithm 
generating, iteratively, optimized structures, coupled with an 
energy re-evaluation at a higher level of theory (FMO/PCM 
MP2/6-31G(d)). 

In the second part of this study, the optimization algorithm was 
extended to permit the stochastic change of the side chain of 
selected residues, in order to better bind a given target peptide 
sequence. The selected target peptide was required to be different 
from the endogenous peptides. The desired outcome was a mutant 
PR with stronger and weaker predicted binding energy for the 
target and endogenous peptides, respectively, compared to WT- 
PR. The mutant PRs M22 and M24 generated through Strategy2 
exhibit such desired characteristics (Table 6). We analyzed the 
backbone structure of WT-PR and M24 and found no major 
differences, thus indicating that M24 should retain the general 



structure features of wild type HIV-1 protease. Strategy2 
algorithm is able to predict mutations outside the usual set of 
residues involved in drug resistance, possibly giving an ulterior 
insight into the binding process of HIV-1 protease. 

Ongoing experimental studies will show if and how well M22 
and/or M24 bind and cleave the target sequence. Our current 
experimental and computational studies are also aimed at 
analyzing M24 mutations, residue by residue and in combination, 
and their possible role in binding the target sequence. It is our 
hope that the experimental tests will provide enough information 
to be used to further improve the mutant generating algorithm. If 
the combination of computational algorithm and experimental 
verification is successful it will maybe permit the design of mutant 
PRs specific for any given substrate peptide. 

Computational Methods 

In general, the activity of an enzyme towards two similar 
substrates is regulated by (i) how good the enzyme-substrate 
binding is and (ii) how efficient the enzymatic reaction is. 
Following the dynamic substrate envelope hypothesis [9,10], we 



Table 6. FMO computed binding energies of HIV-1 protease WT and Strategy2 mutant PRs. 





PR 


Peptides 














Target 


poly-Ala 


poly-Arg 


TF-PR 


CA-p2 


p2-NC 


WT-PR 


-9 


-15 


-41 


-62 


-52 


-72 


Gen 1 


M17 


-34 


-13 


-47 


-68 


-82 


-74 


M18 


-24 


-19 


-45 


-82 


-62 


-63 


Gen 2 


M19 


-17 


2 


1 


-67 


-46 


-81 


Gen 3 


M20 


-23 


-2 


-19 


-67 


-33 


-84 


Gen 4 


M21 


-20 


2 


7 


-37 


-32 


-18 


Gen 5 


M22 


-29 


-6 


10 


-42 


-25 


-30 


M24 


-29 


-11 


7 


-44 


-33 


-33 



Computed binding energies (kcal/mol) of WT-PR and Strategy2 mutant proteases in complex with Target, poly-alanine, poly-arginine, TF-PR, CA-p2 and p2-NC peptides. 
doi:1 0.1 371 /joumal.pone.0095833.t006 
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Figure 2. Backbone difference between PyRosetta computed 
structures of WT-PR and M24. The optimized structures of WT-PR 
and M24 binding the target peptide were aligned with respect to their 
a-carbon atoms using PyMol. The backbone of WI24 (red) is almost 
coincident with that of WT-PR (green) with a RMS of 0.227 A. 
doi:1 0.1 371 /journal.pone.0095833.g002 

assume a correlation between the binding of different peptides to 
PR and cleavability of the former. Thus we compute qualitative 
binding energies, on the premise that lower binding energy equals 
better cleavability. 

Binding Energies 

PyRosetta Algorithm. The structure of wild type (WT) HIV- 
1 protease in complex with different octa-peptides was optimized 
using PyRosetta 1.1 [16], a python script-based interface to 
Rosetta [17], and the algorithm depicted in Figure 4. The 
algorithm is based on the flexible peptide-docking algorithm used 
by Chaudhury and Gray [15] to identify in WT HIV-1 protease 
the active-site residues mostly involved in the discrimination of 
cleavable and non-cleavable peptides. Following their algorithm, 
the HIV-1 protease - peptide complexes are represented in atomic 
resolution, as opposed to a coarse-grain representation. With 
respect to the algorithm described in [15], our algorithm (Figure 4) 
has a larger number of cycles (8x4x6=192 compared to 
8x12 = 96), and more 'small' and 'shear' moves for the 
perturbation of both the side chain and the backbone atoms. 
The side chain conformations are further optimized through a 
repacking algorithm [32] and using the extended Dunbrack library 
[33,34]. The moves are applied to all residues of the substrate 
peptide plus a selected number of residues of the protease, with the 
following criterion: all residues inside a 5 A distance from any 
atom of the substrate peptide, plus all the residues reported as 
active by Chaudhury and Gray [15], plus their + 1 neighbours, 
plus if one residue is included on only one chain it is made to be 
included in both. After the moves, an energy minimization step is 
performed, based on the Davidon-Fletcher-Powell method 
[35,36]. Each structure is then accepted or rejected based on a 
Monte Carlo (MC) criterion depending on the standard 
RosettaDock energy function [32,33,37-39]. Along the optimiza- 
tion a temperature gradient was applied, from an initial value of 
kT = 3.0 to 1.0, unless differently stated. 500 decoy structures were 
generated using 5 parallel algorithm runs, each producing 100 
structures. 

The main difference with the algorithm of [15] is that after the 
algorithm produced 500 decoy structures, the lowest in energy is 
chosen and used as a starting structure for another cycle of 
optimization. This process is repeated K times, until convergence. 
It was found that, after at least 5 cycles, the computed 
RosettaDock energy did not change between subsequent cycles 
as soon as all 5 parallel runs of a single cycle produced structures 
with the same energy. Consequently, in order to render as 



automatic as possible the algorithm, the fact that K > 5 and that 
each parallel run produced, as best structure, a decoy with the 
same energy was taken as a mark for convergence. It was found 
that, on average, a value of K = 20 was sufficient. As an example, 
Figure 5 reports the energy of WT-PR bound to TF-PR along the 
optimization. The points at each step corresponds to the 
RosettaDock energy of the lowest in energy decoy out of the 
500 computed at that particular step. Such structure would then 
be used as starting point for the next cycle. At the end of the K 
cycles the lowest in energy decoy is chosen as the PyRosetta 
optimized structure. 

The same algorithm was also used for the optimization of 
mutant HIV-1 proteases (vide infra), the octa-peptides alone, and 
the protease alone as apo-protein. 

The starting structures were prepared from that of HIV-1 
protease in complex with an inhibitor (PDB accession code 1 HXB 
[40]), considered as apo-protein. In order to place the substrate 
peptide, the structure of a D25N deactivated protease in complex 
with the natural substrate peptide p2-NC (PDB accession code 
1KJ7 [9]) was aligned with respect to the backbone atoms of the 
protease (RMS = 0.436 A). The starting structure was then 
composed using the apo-protein from 1HXB and the substrate 
peptide from 1KJ7. All subsequent protease-peptide complexes 
were created starting from this structure and mutating the peptide 
accordingly. See Tables SI, S3 and S4 in File SI for a complete list 
of the considered substrate peptides. Hydrogen atoms were added 
through the program Pymol [41]. 

Further Structures Optimization and Energetic Re- 
evaluation. The position of the hydrogen atoms of each 
PyRosetta generated structure was optimized using Open Babel 
[42] with the MMFF94 [43-47] force field. The energy of each 
structure was finally re-evaluated at the higher level of theory 
'FM02-MP2/6-31G(d)/PCM [1]'. Single point energy evalua- 
tions were carried out using the fragment molecular orbital (FMO) 
approximation [48,49], as implemented in GAMESS [50]. Each 
FMO calculation was carried out at the MP2 level of theory [51] 
with the 6-31 G(d) basis set [52,53] and the Polarazible 
Continuum Model (PCM) approximation [54,55]. Pairs of 
fragments separated by more than two van der Waals radii were 
calculated using a Coulomb expression for the interaction energy 
and ignoring correlation effects (RESDIM = 2.0 RCORSD = 2.0 
in $FMO). The input files for the FMO calculations were prepared 
using the program FRAGIT [56]. 

Binding Energies Evaluation. The re-evaluated energy of 
every optimized structure was used to compute the binding energy 
of PR with different substrate peptides. The binding energy 
of HIV-1 protease (wild type or mutated) and a peptide was 
evaluated with equation (1), where E 'complex is the energy of the 
complex, Eapo the energy of the protease optimized as apo- 
protein, Ep ep the energy of the optimized peptide. 

Emnd = ^Complex ~ [EaPO + Ep ep j ( 1 ) 

These binding energies can not be directly compared to 
experimental values, for which a much more complex and 
accurate methodology is required [57]. These energies were used 
only to qualitatively compare different PR - peptide combinations. 

Mutation Algorithm 

A similar procedure as that described in Figure 4 was used to 
produce mutant HIV-1 proteases, possibly capable of cleaving a 
given peptide different from the endogenous substrate peptides. 
The general idea was to 'expose' the protease to a different peptide 
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Figure 3. Spatial disposition of the residues changed by Strategy2. The six residues of chain A (top) and 6 residues of chain B (bottom) are 
highlighted in ball-and-sticks. The reported structure (as semi-transparent cartoon) is that of WT-PR optimized when binding the target peptide (only 
the backbone is shown in sticks). Figures S3-S14 in File S1 report the full residue by residue changes. A movie showing the three dimensional 
structure is included as Supporting Material in File S1. 
doi:1 0.1 371 /journal.pone.0095833.g003 



and allow some residues to change in order to accommodate it 
better. A target octa-peptide was chosen: HLSF*MAIP, where the 
* symbol indicates the desired cleaving site. The peptide sequence 
was extracted from that of K-casein. 

The assumption behind the algorithm is that lowering the 
energy of the PR - peptide complex by changing the side chains of 
selected residues would decrease also the binding energy, thus 
increasing the cleavability. 

Two different methodologies were designed to predict mutant 
PRs, Strategy 1 and Strategy2. The Strategy 1 mutation algorithm 
is depicted in Figure 6. Each optimization step corresponds to the 
algorithm of Figure 4. In the mutation steps (also based on the 



previous algorithm), the Dunbrack library of rotamers includes all 
rotamers of all amino acids, but only for a selected number of 
residues. The six specificity determining residues, as found by [15], 
are chosen to be altered. In other words, during the mutation step, 
whenever one of the alterable residues is being optimized, the 
random choice of a test rotamer is among all possible amino acids. 
In Scheme A alterations are allowed on all 6 residues on both 
chains, for a total of 12 alterable residues. Thus, side-chain 
perturbation and repacking rotamer choice is performed randomly 
selecting among 12x20 = 240 possible amino acids. In Scheme B 
only alterations on L76 and V82 of Chain A and D30, 147, G48, 
and 184 of Chain B are allowed, for a total of 6 alterable residues. 
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I Substrate peptide sequence I 
JL 



x6 



x4 



Starting structure k- 
i 

\p, x perturbations 
31 



Energy minimization 
(Davidon- Fletcher- Powell) 



MC criterion 
31 



x8 



Side-chain packing 



x K 
times 



Out of 500 
decoy structures 
the lowest in 
energy is 
chosen. 



MC criterion 
i 

' : ' decoy 



ca. 24 hours 
on 5 cpus 



Figure 4. PyRosetta based optimization algorithm. tj>, / 

represent perturbations applied to both backbone and side chain 
dihedral angles. MC criterion stands for a Monte Carlo based check of 
decoy structures. 

doi:1 0.1 371 /journal.pone.0095833.g004 

In this case, side-chain perturbation and repacking rotamer choice 
is performed with a random selection among 6x20 = 120 possible 
amino acids. Each mutation step took ca. 40 hours on 5 cpus to 
produce 500 decoys. The lowest energy decoy is then chosen as 
starting structure for the next step. The energy of the structure is 
evaluated with the standard RosettaDock energy function. The 
residue reference energy part of the energy function [32] takes into 
account also the differences between different amino acids. In 
other words, energy differences between two mutant structures 
originates solely from different side chain interactions rather than 
from a different number of atoms. 

Both the mutation and the optimization steps were repeated K 
and K times, respectively. The mutation cycles are considered 
converged once two following cycles do not introduce new 
mutations. Different values of K and K were found necessary to 
reach convergence. After a series of mutation cycles (K > 8), a 
series of optimization cycles was performed (K>8), followed by 
another usually shorter mutation cycle (K < 3) and finally a short 
optimization cycle (K<3). 

Among the naturally cleaved peptides, TF-PR (sequence 
SFNF*PQIT) was chosen as a starting substrate peptide, since it 
is the most similar, in terms of conserved residues, to the target 
peptide (sequence HLSF*MAIP). Consequently, the optimized 
structure of WT protease in complex with the TF-PR peptide was 
chosen as starting template. The substrate peptide sequence was 
altered one amino acid at the time, as reported in Table S10 in 
File SI. After each peptide alteration, a series of protease mutation 
and optimization cycles were performed. Once convergence was 
reached, a new peptide single amino acid change was introduced 
and the procedure repeated. Different mutant PRs were obtained 
from different runs by changing a few parameters, e.g. the initial 
temperature of the simulation. These parameters are specified in 
Table S7 in File SI. Some mutant PRs were also produced by 
'exposing' the protease directly to the target peptide without prior 
intermediates (mutation Scheme F). This last process required a 
higher number of K cycles (K > 15), but without having to cycle 
through one substrate peptide residue at the time. 

All mutant PRs obtained through Strategy 1 were heterodimers. 
By simply equalizing alterations on both chains a number of extra 
homodimer mutant PRs were also obtained. These structures were 
subsequently optimized as previously described. 

In Strategy2 the number of residues allowed to change was 
increased in order to include all amino acids residing inside a 3 A 
radius from the TF-PR peptide. In other words, we chose those 




6 8 10 

Optimization Step ( K) 

Figure 5. Optimization algorithm convergence. Example of 
energy convergence during the various macro cycles of the optimiza- 
tion algorithm for WT-PR in complex with TF-PR peptide. Each point 
along the graph corresponds to the energy (computed with the 
RosettaDock energy function) of the lowest in energy decoy out of 500 
produced during each of the K steps. 
doi:10.1371/joumal.pone.0095833.g005 

residues with at least one atom that is distant at most 3 A from any 
atom of the substrate peptide. The specificity determining residues 
were also included in the set of alterable amino acids, if not already 
present. The residues Asp25, Thr26 and Gly27 of both chains 
were excluded from the set, since they represent the catalytic triad 
[5]. The full set of 26 residues is reported in Table S9 in File SI. 
Thus, side-chain perturbation and repacking rotamer choice is 
performed randomly selecting among 26 x20 = 520 possible amino 
acids. The mutant PRs were generated using the target peptide 
directly (Scheme F). Each mutation step took a bit more than 3 
days on 5 cpus to produce 500 decoy structures. An initial 
temperature of 9 kT was usued. K = 6 mutation cycles were 
performed. The lowest in energy decoy after each mutation step 
was subsequently optimized (two after the first step). The sixth 
mutation step did not introduce any new mutation in PR and the 
mutation cycle was stopped. 



Substrate peptide starting 
sequence 



WT structure 
1 



Mutate 1 aa of substrate 
towards target sequence 
I 



xl 



-H Protease mutation steps 
1 

Protease optimization steps 



x K' times 



x K times 



Mutated protease in complex 
with target peptide 



Figure 6. PyRosetta based mutation algorithm. The optimization 
step is the algorithm presented in Figure 4. In the mutation step the 
side chain perturbation for the six specificity determining residues is 
among all possible rotamer of all 20 amino acids. 
doi:10.1371/joumal.pone.0095833.g006 



PLOS ONE | www.plosone.org 



9 



May 2014 | Volume 9 | Issue 5 | e95833 



In Silico Prediction of Mutant HIV-1 Proteases 



Also the mutant PRs obtained through Strategy2 were 
heterodimers. Only the homodimer of the final mutant PR was 
considered, see Table 5. 

Supporting Information 

File SI Tables SI— Sll, Figures S1-S14, a movie showing the 
three dimensional structure of WT-PR bound to the target 
peptide, with highlighted the residues that are changed in M24. 
(PDF) 
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